Towards measuring continuous acoustic feature convergence in unconstrained spoken dialogues
نویسندگان
چکیده
Acoustic/prosodic feature (a/p) convergence has been known to occur both in dialogues between humans, as well as in human-computer interactions. Understanding the form and function of convergence is desirable for developing next generation conversational agents, as this will help increase speech recognition performance and naturalness of synthesized speech. Currently, the underlying mechanisms by which continuous and bi-directional convergence occurs are not well understood. In this study, a direct comparison between time-aligned frames shows significant similarity in acoustic feature variation between the two speakers. The method described (TAMA) constitutes a first step towards a quantitative analysis of a/p convergence.
منابع مشابه
مدلسازی بازشناسی واجی کلمات فارسی
Abstract of spoken word recognition is proposed. This model is particularly concerned with extraction of cues from the signal leading to a specification of a word in terms of bundles of distinctive features, which are assumed to be the building blocks of words. In the model proposed, auditory input is chunked into a set of successive time slices. It is assumed that the derivation of the underly...
متن کاملRobust numeric recognition in spoken language dialogue
This paper addresses the problem of automatic numeric recognition and understanding in spoken language dialogue. We show that accurate numeric understanding in ̄uent unconstrained speech demands maintaining robustness at several dierent levels of system design, including acoustic, language, understanding and dialogue. We describe a robust system for numeric recognition and present algorithms f...
متن کاملThe Swedish NICE Corpus – Spoken and embodied characters in a c
This article describes the collection and analysis of a Swedish database of spontaneous and unconstrained children–machine dialogues. The Swedish NICE corpus consists of spoken dialogues between children aged 8 to 15 and embodied fairytale characters in a computer game scenario. Compared to previously collected corpora of children’s computer-directed speech, the Swedish NICE corpus contains ext...
متن کاملTowards Emotion Prediction in Spoken Tutoring Dialogues
Human tutors detect and respond to student emotional states, but current machine tutors do not. Our preliminary machine learning experiments involving transcription, emotion annotation and automatic feature extraction from our human-human spoken tutoring corpus indicate that the spoken tutoring system we are developing can be enhanced to automatically predict and adapt to student emotional states.
متن کاملRecognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors
While human tutors respond to both what a student says and to how the student says it, most tutorial dialogue systems cannot detect the student emotions and attitudes underlying an utterance. We present an empirical study investigating the feasibility of recognizing student state in two corpora of spoken tutoring dialogues, one with a human tutor, and one with a computer tutor. We first annotat...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008